Exporting to a Formatted Output Document (ImageGear Professional v18.4)
The ImageGear Recognition document API lets you save recognized data in a number of document formats, such as RTF, Microsoft Office Word, or Excel.
This API group requires IG_REC_FEATURE_FORMATTED_OUTPUT to be enabled.
After you have successfully recognized an image (or a series of images), create an HIG_REC_DOCUMENT object to accumulate the recognized pages and write them to the final output document. Use the IG_REC_document_create function to create an empty HIG_REC_DOCUMENT object, then use the IG_REC_document_page_insert function to insert recognized pages into it. The recognition document API also lets you remove, update, or reorder pages. You can save the document to an intermediate file in the native data format with IG_REC_document_save and reopen it later with IG_REC_document_open. When the document is no longer needed, you must close it with the IG_REC_document_close function.
When a page is added to a document, the document takes ownership of the recognized data, and the page object passed to IG_REC_document_page_insert becomes invalid. If you need to re-recognize an image that has already been added to a document, import it again from its HIGEAR image, recognize it, and then update the corresponding page in the document using IG_REC_document_page_update.
When all document pages have been recognized, write the final document with the IG_REC_document_write function. Before writing, specify the code page, the output format, and the level of format retention with the IG_REC_output_codepage_set, IG_REC_output_format_set, and IG_REC_output_level_set functions. The full list of supported output formats is given in the topic Output Text Format List.
Use the IG_REC_output_format_first_get and IG_REC_output_format_next_get functions to enumerate all supported output formats at run time.
Enumerating the Available Output Text Formats
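```c
/* Requires <stdio.h> for printf, plus the ImageGear headers */
AT_ERRCOUNT nErrCount;
AT_CHAR szFormatName[128];

/* Retrieve the first supported output format name */
nErrCount = IG_REC_output_format_first_get((LPSTR)szFormatName, sizeof(szFormatName));
while (nErrCount == 0)
{
    printf("%s\n", szFormatName);
    /* Retrieve the next format name */
    nErrCount = IG_REC_output_format_next_get((LPSTR)szFormatName, sizeof(szFormatName));
    if (nErrCount == 0)
    {
        /* Also stop the loop if the call raised a warning */
        nErrCount = IG_warning_check();
    }
}
```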
Recognizing a Multi-Page Document
```c
AT_ERRCOUNT nErrCount;   /* error checking omitted for brevity */
AT_INT i;
AT_INT nPageCount;
HIGEAR hIGear;
HIG_REC_IMAGE hImg;
HIG_REC_DOCUMENT hDocument;
LPSTR szFile = "Multipage.tif";

/* Create a document backed by the recognition data file MULTIPAG.RDO */
nErrCount = IG_REC_document_create("MULTIPAG.RDO", &hDocument);
nErrCount = IG_page_count_get(szFile, &nPageCount);
for (i = 0; i < nPageCount; i++)
{
    /* Load page i + 1 of the source file */
    nErrCount = IG_fltr_load_file(szFile, i + 1, &hIGear);
    /* Import the page for recognition, then release the HIGEAR image */
    nErrCount = IG_REC_image_import(hIGear, &hImg);
    nErrCount = IG_image_delete(hIGear);
    nErrCount = IG_REC_image_preprocess(hImg);
    nErrCount = IG_REC_image_recognize(hImg);
    /* Add the page at the end (-1); the document takes ownership of hImg */
    nErrCount = IG_REC_document_page_insert(hDocument, hImg, -1);
}
/* Specify the code page and file format for the final output document
   (IG_REC_output_level_set can also be called here) */
nErrCount = IG_REC_output_codepage_set("Windows ANSI");
nErrCount = IG_REC_output_format_set("Converters.Text.Word97");
/* Save the recognized pages as MS Word 97 */
nErrCount = IG_REC_document_write(hDocument, "MULTIPAG.DOC");
/* Close the document */
nErrCount = IG_REC_document_close(hDocument);
```
In this example, the application specifies the output format (MS Word 97) for the final output document (MULTIPAG.DOC) with the IG_REC_output_format_set() call. The IG_REC_document_write() call then converts the multi-page recognition result, stored in the recognition data file MULTIPAG.RDO, into the requested format. The application should delete MULTIPAG.RDO when it is no longer needed.
When the IG_REC_document_create function is called with the first parameter equal to NULL, the application doesn't have to deal with the recognition data file: the recognition component handles it internally (i.e., the default recognition data file is used automatically and deleted when the document is closed).